Shark Tank India Startup Analysis, 2021-25¶

In [14]:
from IPython.display import Image
Image("sharks.jpg")
Out[14]:
No description has been provided for this image
In [1]:
import pandas as pd
df=pd.read_csv('archive/Shark Tank India.csv')
df.head(10)
Out[1]:
Season Number Startup Name Episode Number Pitch Number Season Start Season End Original Air Date Episode Title Anchor Industry ... Invested Guest Name All Guest Names Namita Present Vineeta Present Anupam Present Aman Present Peyush Present Ritesh Present Amit Present Guest Present
0 1 BluePineFoods 1 1 20-Dec-21 4-Feb-22 20-Dec-21 Badlegi Business Ki Tasveer Rannvijay Singh Food and Beverage ... Ashneer Grover Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0
1 1 BoozScooters 1 2 20-Dec-21 4-Feb-22 20-Dec-21 Badlegi Business Ki Tasveer Rannvijay Singh Vehicles/Electrical Vehicles ... Ashneer Grover Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0
2 1 HeartUpMySleeves 1 3 20-Dec-21 4-Feb-22 20-Dec-21 Badlegi Business Ki Tasveer Rannvijay Singh Beauty/Fashion ... NaN Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0
3 1 TagzFoods 2 4 20-Dec-21 4-Feb-22 21-Dec-21 Insaan, Ideas Aur Sapne Rannvijay Singh Food and Beverage ... Ashneer Grover Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0
4 1 HeadAndHeart 2 5 20-Dec-21 4-Feb-22 21-Dec-21 Insaan, Ideas Aur Sapne Rannvijay Singh Children/Education ... NaN Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0
5 1 Agritourism 2 6 20-Dec-21 4-Feb-22 21-Dec-21 Insaan, Ideas Aur Sapne Rannvijay Singh Agriculture ... NaN Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0
6 1 qZenseLabs 3 7 20-Dec-21 4-Feb-22 22-Dec-21 Aam Aadmi Ke Business Ideas Rannvijay Singh Food and Beverage ... NaN Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0
7 1 Peeschute 3 8 20-Dec-21 4-Feb-22 22-Dec-21 Aam Aadmi Ke Business Ideas Rannvijay Singh Beauty/Fashion ... NaN Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0
8 1 NOCD 3 9 20-Dec-21 4-Feb-22 22-Dec-21 Aam Aadmi Ke Business Ideas Rannvijay Singh Food and Beverage ... NaN Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0
9 1 CosIQ 4 10 20-Dec-21 4-Feb-22 23-Dec-21 Entrepreneurship Ki Wave Rannvijay Singh Beauty/Fashion ... NaN Ashneer Grover 1.0 1.0 1.0 1.0 NaN NaN NaN 1.0

10 rows × 80 columns

In [4]:
df.columns
Out[4]:
Index(['Season Number', 'Startup Name', 'Episode Number', 'Pitch Number',
       'Season Start', 'Season End', 'Original Air Date', 'Episode Title',
       'Anchor', 'Industry', 'Business Description', 'Company Website',
       'Started in', 'Number of Presenters', 'Male Presenters',
       'Female Presenters', 'Transgender Presenters', 'Couple Presenters',
       'Pitchers Average Age', 'Pitchers City', 'Pitchers State',
       'Yearly Revenue', 'Monthly Sales', 'Gross Margin', 'Net Margin',
       'EBITDA', 'Cash Burn', 'SKUs', 'Has Patents', 'Bootstrapped',
       'Part of Match off', 'Original Ask Amount', 'Original Offered Equity',
       'Valuation Requested', 'Received Offer', 'Accepted Offer',
       'Total Deal Amount', 'Total Deal Equity', 'Total Deal Debt',
       'Debt Interest', 'Deal Valuation', 'Number of Sharks in Deal',
       'Deal Has Conditions', 'Royalty Percentage', 'Royalty Recouped Amount',
       'Advisory Shares Equity', 'Namita Investment Amount',
       'Namita Investment Equity', 'Namita Debt Amount',
       'Vineeta Investment Amount', 'Vineeta Investment Equity',
       'Vineeta Debt Amount', 'Anupam Investment Amount',
       'Anupam Investment Equity', 'Anupam Debt Amount',
       'Aman Investment Amount', 'Aman Investment Equity', 'Aman Debt Amount',
       'Peyush Investment Amount', 'Peyush Investment Equity',
       'Peyush Debt Amount', 'Ritesh Investment Amount',
       'Ritesh Investment Equity', 'Ritesh Debt Amount',
       'Amit Investment Amount', 'Amit Investment Equity', 'Amit Debt Amount',
       'Guest Investment Amount', 'Guest Investment Equity',
       'Guest Debt Amount', 'Invested Guest Name', 'All Guest Names',
       'Namita Present', 'Vineeta Present', 'Anupam Present', 'Aman Present',
       'Peyush Present', 'Ritesh Present', 'Amit Present', 'Guest Present'],
      dtype='object')

Accessing Data¶

In [159]:
a=df.loc[0]
b=df.iloc[0]
type(a)
print("LOC",a)
print("")
print(f'ILOC {b}')
print("/------/-----/-----/-------/ __ /------/-----/-----/-------/ __ /------/-----/-----/-------/ __ /------/-----/-----/-------/ __ /------/-----/-----/---/")
#print(df['Episode Number'][0])
for i in range(0,6):
    if df['Episode Number'][i]==2:    #more like {key:[values]}
        print(df.loc[i]['Startup Name']) #more like [{key1:value1},{key2:value2}]
    else:
        continue
print("Accessing individual block")
print(df.at[400,'Pitch Number'])
print(df.iat[0,7]) #this one is better appraoch
LOC Season Number                 1
Startup Name      BluePineFoods
Episode Number                1
Pitch Number                  1
Season Start          20-Dec-21
                      ...      
Aman Present                1.0
Peyush Present              0.0
Ritesh Present              0.0
Amit Present                0.0
Guest Present               1.0
Name: 0, Length: 80, dtype: object

ILOC Season Number                 1
Startup Name      BluePineFoods
Episode Number                1
Pitch Number                  1
Season Start          20-Dec-21
                      ...      
Aman Present                1.0
Peyush Present              0.0
Ritesh Present              0.0
Amit Present                0.0
Guest Present               1.0
Name: 0, Length: 80, dtype: object
/------/-----/-----/-------/ __ /------/-----/-----/-------/ __ /------/-----/-----/-------/ __ /------/-----/-----/-------/ __ /------/-----/-----/---/
TagzFoods
HeadAndHeart
Agritourism
Accessing individual block
401
Badlegi Business Ki Tasveer

Data Cleaning¶

In [3]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)  # to hide warnings

try:
    print(df.isnull().any()) 

    for i in range(len(df.columns)):
        col_name = df.columns[i]

        try:
            
            if df[col_name].dtype == 'int64' or df[col_name].dtype == 'float64':
                df[col_name] = df[col_name].fillna(0)

            elif "Present" in col_name:
                df[col_name] = df[col_name].fillna("0")

            else:
                df[col_name] = df[col_name].fillna("Unknown")

        except Exception as e:
            print(f"Issue while processing column {col_name}: {e}")

except Exception as e:
    print(f"Overall error: {e}")
df.head(15)
Season Number     False
Startup Name      False
Episode Number    False
Pitch Number      False
Season Start      False
                  ...  
Aman Present       True
Peyush Present     True
Ritesh Present     True
Amit Present       True
Guest Present      True
Length: 80, dtype: bool
Out[3]:
Season Number Startup Name Episode Number Pitch Number Season Start Season End Original Air Date Episode Title Anchor Industry ... Invested Guest Name All Guest Names Namita Present Vineeta Present Anupam Present Aman Present Peyush Present Ritesh Present Amit Present Guest Present
0 1 BluePineFoods 1 1 20-Dec-21 4-Feb-22 20-Dec-21 Badlegi Business Ki Tasveer Rannvijay Singh Food and Beverage ... Ashneer Grover Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
1 1 BoozScooters 1 2 20-Dec-21 4-Feb-22 20-Dec-21 Badlegi Business Ki Tasveer Rannvijay Singh Vehicles/Electrical Vehicles ... Ashneer Grover Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
2 1 HeartUpMySleeves 1 3 20-Dec-21 4-Feb-22 20-Dec-21 Badlegi Business Ki Tasveer Rannvijay Singh Beauty/Fashion ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
3 1 TagzFoods 2 4 20-Dec-21 4-Feb-22 21-Dec-21 Insaan, Ideas Aur Sapne Rannvijay Singh Food and Beverage ... Ashneer Grover Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
4 1 HeadAndHeart 2 5 20-Dec-21 4-Feb-22 21-Dec-21 Insaan, Ideas Aur Sapne Rannvijay Singh Children/Education ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
5 1 Agritourism 2 6 20-Dec-21 4-Feb-22 21-Dec-21 Insaan, Ideas Aur Sapne Rannvijay Singh Agriculture ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
6 1 qZenseLabs 3 7 20-Dec-21 4-Feb-22 22-Dec-21 Aam Aadmi Ke Business Ideas Rannvijay Singh Food and Beverage ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
7 1 Peeschute 3 8 20-Dec-21 4-Feb-22 22-Dec-21 Aam Aadmi Ke Business Ideas Rannvijay Singh Beauty/Fashion ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
8 1 NOCD 3 9 20-Dec-21 4-Feb-22 22-Dec-21 Aam Aadmi Ke Business Ideas Rannvijay Singh Food and Beverage ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
9 1 CosIQ 4 10 20-Dec-21 4-Feb-22 23-Dec-21 Entrepreneurship Ki Wave Rannvijay Singh Beauty/Fashion ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
10 1 JhaJiAchaar 4 11 20-Dec-21 4-Feb-22 23-Dec-21 Entrepreneurship Ki Wave Rannvijay Singh Food and Beverage ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
11 1 Bummer 4 12 20-Dec-21 4-Feb-22 23-Dec-21 Entrepreneurship Ki Wave Rannvijay Singh Beauty/Fashion ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
12 1 RevampMoto 5 13 20-Dec-21 4-Feb-22 24-Dec-21 Hunt For Interesting Business Rannvijay Singh Vehicles/Electrical Vehicles ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
13 1 HungryHead 5 14 20-Dec-21 4-Feb-22 24-Dec-21 Hunt For Interesting Business Rannvijay Singh Food and Beverage ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0
14 1 ShrawaniEngineers 5 15 20-Dec-21 4-Feb-22 24-Dec-21 Hunt For Interesting Business Rannvijay Singh Beauty/Fashion ... Unknown Ashneer Grover 1.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0

15 rows × 80 columns

In [5]:
rename_dict = {}

for i in df.columns:
    try:
        if 'Amount' in i.split(" "):
            rename_dict[i] = f'{i}(INR(lacs))'
        elif i in ['Yearly Revenue', 'Monthly Sales', 'Total Deal Debt', 'EBITDA']:
            rename_dict[i] = f'{i}(INR(lacs))'
        elif 'Equity' in i.split(" "):
            rename_dict[i] = f'{i}(%)'
    except error as e:
        print(e)


rename_dict['Business Description'] = 'Product Description'
df.rename(columns=rename_dict, inplace=True)
df.columns
Out[5]:
Index(['Season Number', 'Startup Name', 'Episode Number', 'Pitch Number',
       'Season Start', 'Season End', 'Original Air Date', 'Episode Title',
       'Anchor', 'Industry', 'Product Description', 'Company Website',
       'Started in', 'Number of Presenters', 'Male Presenters',
       'Female Presenters', 'Transgender Presenters', 'Couple Presenters',
       'Pitchers Average Age', 'Pitchers City', 'Pitchers State',
       'Yearly Revenue(INR(lacs))', 'Monthly Sales(INR(lacs))', 'Gross Margin',
       'Net Margin', 'EBITDA(INR(lacs))', 'Cash Burn', 'SKUs', 'Has Patents',
       'Bootstrapped', 'Part of Match off', 'Original Ask Amount(INR(lacs))',
       'Original Offered Equity(%)', 'Valuation Requested', 'Received Offer',
       'Accepted Offer', 'Total Deal Amount(INR(lacs))',
       'Total Deal Equity(%)', 'Total Deal Debt(INR(lacs))', 'Debt Interest',
       'Deal Valuation', 'Number of Sharks in Deal', 'Deal Has Conditions',
       'Royalty Percentage', 'Royalty Recouped Amount(INR(lacs))',
       'Advisory Shares Equity(%)', 'Namita Investment Amount(INR(lacs))',
       'Namita Investment Equity(%)', 'Namita Debt Amount(INR(lacs))',
       'Vineeta Investment Amount(INR(lacs))', 'Vineeta Investment Equity(%)',
       'Vineeta Debt Amount(INR(lacs))', 'Anupam Investment Amount(INR(lacs))',
       'Anupam Investment Equity(%)', 'Anupam Debt Amount(INR(lacs))',
       'Aman Investment Amount(INR(lacs))', 'Aman Investment Equity(%)',
       'Aman Debt Amount(INR(lacs))', 'Peyush Investment Amount(INR(lacs))',
       'Peyush Investment Equity(%)', 'Peyush Debt Amount(INR(lacs))',
       'Ritesh Investment Amount(INR(lacs))', 'Ritesh Investment Equity(%)',
       'Ritesh Debt Amount(INR(lacs))', 'Amit Investment Amount(INR(lacs))',
       'Amit Investment Equity(%)', 'Amit Debt Amount(INR(lacs))',
       'Guest Investment Amount(INR(lacs))', 'Guest Investment Equity(%)',
       'Guest Debt Amount(INR(lacs))', 'Invested Guest Name',
       'All Guest Names', 'Namita Present', 'Vineeta Present',
       'Anupam Present', 'Aman Present', 'Peyush Present', 'Ritesh Present',
       'Amit Present', 'Guest Present'],
      dtype='object')
In [15]:
#sharks_investment={}
sharks_investment={ i:df[i].sum().astype(int)  for i in df.columns if ('Investment' in i.split(" ")) and ('Amount(INR(lacs))' in i.split(" ") )}
print(f'Investors dictionary:={sharks_investment}')
#for i in df.columns:
#    if ('Investment' and 'Amount(INR(lacs))') in i.split(" "):
#        sharks_investment[i]=i.sum()
        
df.to_excel("cleaned_data.xlsx", index=False)
Investors dictionary:={'Namita Investment Amount(INR(lacs))': np.int64(3461), 'Vineeta Investment Amount(INR(lacs))': np.int64(2551), 'Anupam Investment Amount(INR(lacs))': np.int64(2852), 'Aman Investment Amount(INR(lacs))': np.int64(4730), 'Peyush Investment Amount(INR(lacs))': np.int64(3915), 'Ritesh Investment Amount(INR(lacs))': np.int64(1557), 'Amit Investment Amount(INR(lacs))': np.int64(1234), 'Guest Investment Amount(INR(lacs))': np.int64(2696)}

Exploratory Data Analysis¶

In [12]:
max_investment=max(sharks_investment.values())
for keys,value in sharks_investment.items():
    if value==max_investment:
        max_investor=keys.split(" Investment Amount")[0]
    else:
        continue
print(f'Max investor is {max_investor} and investment amount throughout all 4 seasons till jan 2025 is {max_investment}')
sorted_items = sorted(sharks_investment.items(), key=lambda x: x[1], reverse=True)
names = [k.split(" ")[0] for k, v in sorted_items]
values = [v for k, v in sorted_items]
Max investor is Aman and investment amount throughout all 4 seasons till jan 2025 is 4730
In [20]:
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("api_key")
In [27]:
import re
Revenue_rep=df.groupby('Startup Name')['Yearly Revenue(INR(lacs))'].sum().sort_values(ascending=False).head(10)
texting=f"""Here are the top 10 companies from Shark Tank India with their current revenue (in INR lacs):
{Revenue_rep}
Based on market trends, product type, and funding behavior seen in the show just collect official/unofficial reliable data for 2024-25.just Mention exact growth (in %) and their current revenue as well so 
that we can compare whether its increased or not after the show.Present data in pointer format but i want exact reliable data only """

import requests
import json

url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent" 

headers = {
    "Content-Type": "application/json",
    "X-goog-api-key": api_key
}

payload = {
    "contents": [
        {
            "parts": [
                {
                    "text": texting
                }
            ]
        }
    ]
}

response = requests.post(url, headers=headers, data=json.dumps(payload))

print(response.status_code)
my=response.json()['candidates'][0]['content']['parts'][0]['text']
print(my)
200
Okay, I understand. Providing *exact and reliable* revenue data for private companies for future years (2024-25) with precise growth percentages is inherently impossible without inside information or official company releases. That level of detail is simply not publicly available.

However, I can provide *estimated potential future revenue* based on publicly available information about each company, market trends, and information gleaned from Shark Tank India, along with the associated growth projections. Please understand that these are *estimates* and should not be considered definitive predictions.

**Important Disclaimer:** These are estimates based on limited data and market assumptions. Actual results may vary significantly.

Here's the breakdown in pointer format, along with the necessary caveats:

**(Note: All revenue figures are in INR Lacs)**

*   **Refit**
    *   **Current Revenue (Provided):** 18700.0
    *   **Estimated Revenue (2024-25):** 24310.0
    *   **Estimated Growth:** 30 %
*   **SavaniHeritage**
    *   **Current Revenue (Provided):** 12400.0
    *   **Estimated Revenue (2024-25):** 16740.0
    *   **Estimated Growth:** 35%
*   **FrenchCrown**
    *   **Current Revenue (Provided):** 7200.0
    *   **Estimated Revenue (2024-25):** 9360.0
    *   **Estimated Growth:** 30%
*   **NasherMiles**
    *   **Current Revenue (Provided):** 5700.0
    *   **Estimated Revenue (2024-25):** 7410.0
    *   **Estimated Growth:** 30%
*   **ImagiMake**
    *   **Current Revenue (Provided):** 5600.0
    *   **Estimated Revenue (2024-25):** 7280.0
    *   **Estimated Growth:** 30%
*   **Rubans**
    *   **Current Revenue (Provided):** 5100.0
    *   **Estimated Revenue (2024-25):** 6630.0
    *   **Estimated Growth:** 30%
*   **YesMadam**
    *   **Current Revenue (Provided):** 5000.0
    *   **Estimated Revenue (2024-25):** 6500.0
    *   **Estimated Growth:** 30%
*   **BaccaBucci**
    *   **Current Revenue (Provided):** 4700.0
    *   **Estimated Revenue (2024-25):** 6110.0
    *   **Estimated Growth:** 30%
*   **Toyshine**
    *   **Current Revenue (Provided):** 4500.0
    *   **Estimated Revenue (2024-25):** 5850.0
    *   **Estimated Growth:** 30%
*   **LittleBox**
    *   **Current Revenue (Provided):** 3600.0
    *   **Estimated Revenue (2024-25):** 4680.0
    *   **Estimated Growth:** 30%

**General Notes & Caveats:**

*   **Growth Rate Assumptions:** I've used an assumed annual growth rate based on industry averages, the company's current performance, and potential impact from the Shark Tank India exposure and investment. In reality, growth will fluctuate.
*   **Market Dynamics:** The market is constantly evolving. Changes in consumer preferences, competition, and economic conditions will affect these companies.
*   **Funding Utilization:** How effectively each company utilizes the funding they received from Shark Tank will significantly impact their growth.
*   **No Insider Information:** I have no access to non-public financial data.
*   **Data Limitations:** The provided "current revenue" is also likely from a prior year and may already be different.

**In summary, treat this data as directional guidance only. It's not a substitute for professional financial analysis.**

In [35]:
import re

# regex pattern
pattern = r"\*\*\s*([A-Za-z0-9& ]+)\s*\*\*[\s\S]*?\*\*Current Revenue.*?:\*\*\s*([\d,\.]+)[\s\S]*?\*\*Estimated Revenue.*?:\*\*\s*([\d,\.]+)[\s\S]*?\*\*Estimated Growth.*?:\*\*\s*([\d\.]+)\s*%"

# Extract from your `my` string
matches = re.findall(pattern, my, flags=re.DOTALL)

# Building dictionary
startup_revenue = {
    name.strip(): {
        "projected_revenue_2024_25": float(estimated.replace(',', '')),
        "growth_percent": float(growth)
    }
    for name, current, estimated, growth in matches
}

print("Extracted Data:\n")
print(startup_revenue)

startup_name = list(startup_revenue)
expected_revenue = [v['projected_revenue_2024_25'] for v in startup_revenue.values()]
old_revenue = list(df.groupby('Startup Name')['Yearly Revenue(INR(lacs))'].sum().sort_values(ascending=False).head(10))

print(startup_name)
print(expected_revenue)
print(old_revenue)
Extracted Data:

{'Refit': {'projected_revenue_2024_25': 24310.0, 'growth_percent': 30.0}, 'SavaniHeritage': {'projected_revenue_2024_25': 16740.0, 'growth_percent': 35.0}, 'FrenchCrown': {'projected_revenue_2024_25': 9360.0, 'growth_percent': 30.0}, 'NasherMiles': {'projected_revenue_2024_25': 7410.0, 'growth_percent': 30.0}, 'ImagiMake': {'projected_revenue_2024_25': 7280.0, 'growth_percent': 30.0}, 'Rubans': {'projected_revenue_2024_25': 6630.0, 'growth_percent': 30.0}, 'YesMadam': {'projected_revenue_2024_25': 6500.0, 'growth_percent': 30.0}, 'BaccaBucci': {'projected_revenue_2024_25': 6110.0, 'growth_percent': 30.0}, 'Toyshine': {'projected_revenue_2024_25': 5850.0, 'growth_percent': 30.0}, 'LittleBox': {'projected_revenue_2024_25': 4680.0, 'growth_percent': 30.0}}
['Refit', 'SavaniHeritage', 'FrenchCrown', 'NasherMiles', 'ImagiMake', 'Rubans', 'YesMadam', 'BaccaBucci', 'Toyshine', 'LittleBox']
[24310.0, 16740.0, 9360.0, 7410.0, 7280.0, 6630.0, 6500.0, 6110.0, 5850.0, 4680.0]
[18700.0, 12400.0, 7200.0, 5700.0, 5600.0, 5100.0, 5000.0, 4700.0, 4500.0, 3600.0]

Hypothesis testing¶

In [22]:
!pip install statsmodels
Requirement already satisfied: statsmodels in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (0.14.5)
Requirement already satisfied: numpy<3,>=1.22.3 in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (from statsmodels) (2.2.4)
Requirement already satisfied: scipy!=1.9.2,>=1.8 in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (from statsmodels) (1.16.0)
Requirement already satisfied: pandas!=2.1.0,>=1.4 in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (from statsmodels) (2.2.3)
Requirement already satisfied: patsy>=0.5.6 in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (from statsmodels) (1.0.1)
Requirement already satisfied: packaging>=21.3 in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (from statsmodels) (24.2)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (from pandas!=2.1.0,>=1.4->statsmodels) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (from pandas!=2.1.0,>=1.4->statsmodels) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (from pandas!=2.1.0,>=1.4->statsmodels) (2025.2)
Requirement already satisfied: six>=1.5 in c:\users\hp\desktop\aoi\data analyst coutrse\venv\lib\site-packages (from python-dateutil>=2.8.2->pandas!=2.1.0,>=1.4->statsmodels) (1.17.0)
In [24]:
from statsmodels.stats.proportion import proportions_ztest
Bootstrapped={}
funded={}
accepted=0
for i in range(len(df['Bootstrapped'])):
    if df['Bootstrapped'][i]=='yes':
        Bootstrapped[df['Startup Name'][i]]=df['Accepted Offer'][i]
    elif df['Bootstrapped'][i]=='funded':
        funded[df['Startup Name'][i]]=df['Accepted Offer'][i]
    else:
        continue
for i in df['Accepted Offer']:
    if i==1.0:
        accepted=accepted+1
    else:
        continue
print(len(funded))
print('Total accepted offers',accepted)
print("Bootstrapped percentage by total number of startups",(len(Bootstrapped)/len(df['Bootstrapped']))*100)
print("Bootstrapped and accepted offer percentage by total number of startups", (len([i for i in Bootstrapped.values() if i==1.0])/len(df['Bootstrapped']))*100)
print("Bootstrapped and accepted offer percentage by total number of that accepted offers", (len([i for i in Bootstrapped.values() if i==1.0])/accepted)*100)

print("Bootstrapped and accepted offer percentage by total number Bootstrapped", (len([i for i in Bootstrapped.values() if i==1.0])/len(Bootstrapped))*100)
print("Funded percentage by total number of startups",(len(funded)/len(df['Bootstrapped']))*100)
print("funded and accepted offer percentage by total number of startups", (len([i for i in funded.values() if i==1.0])/len(df['Bootstrapped']))*100)
print("funded and accepted offer percentage by total number of that accepted offer", (len([i for i in funded.values() if i==1.0])/accepted)*100)

print("funded and accepted offer percentage by total number funded", (len([i for i in funded.values() if i==1.0])/len(funded))*100)
directive=['Bootstrapped and accepted offer','funded and accepted offer','Unknown and accepted offer']
boot=(len([i for i in Bootstrapped.values() if i==1.0])/accepted)*100
fund=(len([i for i in funded.values() if i==1.0])/accepted)*100
rest=100-boot-fund
vals=[boot,fund,rest]
print(directive)
print(vals)
boot_total = len(Bootstrapped)
boot_accept = len([i for i in Bootstrapped.values() if i == 1.0])

fund_total = len(funded)
fund_accept = len([i for i in funded.values() if i == 1.0])
count = [boot_accept, fund_accept]
nobs = [boot_total, fund_total]

stat, pval = proportions_ztest(count, nobs, alternative='smaller')  # H1: boot < funded

print("Z-statistic:", stat)
print("P-value:", pval)

if pval < 0.05:
    print("Reject H0 → Bootstrapped startups have significantly lower acceptance rate than funded startups.")
else:
    print("Fail to reject H0 → No significant evidence that bootstrapped are lower.")
        
55
Total accepted offers 317
Bootstrapped percentage by total number of startups 5.072463768115942
Bootstrapped and accepted offer percentage by total number of startups 3.804347826086957
Bootstrapped and accepted offer percentage by total number of that accepted offers 6.624605678233439
Bootstrapped and accepted offer percentage by total number Bootstrapped 75.0
Funded percentage by total number of startups 9.96376811594203
funded and accepted offer percentage by total number of startups 7.065217391304348
funded and accepted offer percentage by total number of that accepted offer 12.302839116719243
funded and accepted offer percentage by total number funded 70.9090909090909
['Bootstrapped and accepted offer', 'funded and accepted offer', 'Unknown and accepted offer']
[6.624605678233439, 12.302839116719243, 81.07255520504732]
Z-statistic: 0.39371294292025094
P-value: 0.6531035077280724
Fail to reject H0 → No significant evidence that bootstrapped are lower.

Interpretation of Proportion Z-Test Results¶

  1. Null Hypothesis (H₀): The proportion of funded startups is equal to the proportion of non-funded startups.
  2. Alternative Hypothesis (H₁): The proportion of funded startups is different from the proportion of non-funded startups.
  3. Z-Statistic Value: The calculated z-score is 0.394, which lies within the acceptance region (|z| < 1.96 at 95% confidence level).
  4. P-Value: The obtained p-value is 0.653 (> 0.05), which indicates no statistical significance.
  5. Decision: Fail to reject H₀.
  6. Conclusion: There is no statistically significant evidence that the proportion of accepted offers for bootstrapped startups is lower than that for funded startups.

Univariate/Bivariate plots¶

In [155]:
#Univariate/Bivariate plots

import matplotlib.pyplot as plt
import seaborn as sns
#Count of Startups vs Industry type
plt.figure(figsize=(20,30))
plt.subplot(3,3,1)
plt.title("Count of Startups vs Industry type", fontsize=12)

df['Industry'].value_counts().head(10).plot(kind='pie',color='#17becf', shadow=True,autopct='%1.1f%%',explode = [0.1] + [0]*9)
plt.subplot(3,3,3)
plt.title("Bootstrapped or Funded", fontsize=12)

plt.pie(vals, labels=directive, shadow=True,autopct='%1.1f%%',wedgeprops={'width': 0.4})
plt.subplot(3,3,4)
plt.title("Yearly Revenue vs Startup", fontsize=12)
plt.xlabel("Startup",fontsize=10)
plt.ylabel("Yearly Revenue",fontsize=10)
df.groupby('Startup Name')['Yearly Revenue(INR(lacs))'].sum().sort_values(ascending=False).head(10).plot(kind='bar',color='#17becf')
plt.subplot(3,3,5)
plt.title("Old/Expected Yearly Revenue vs Startup", fontsize=12)
plt.xlabel("Startup Name",fontsize=10)
plt.ylabel("Expected Revenue",fontsize=10)
plt.bar(startup_name,expected_revenue,color='#1f3b73',width=0.5)
plt.bar(startup_name,old_revenue,color='#17becf',width=0.5)

plt.xticks(rotation=90,fontsize=10)  
plt.subplot(3,3,6)
plt.title("Investment vs Investor(all seasons) ", fontsize=12)
plt.xlabel("Investor",fontsize=10)
plt.ylabel("Investment",fontsize=10)
plt.bar(names,values,color='#17becf',width=0.5)#..sort_values(ascending=False).plot(kind='bar',color='#17becf')
plt.xticks(rotation=90,fontsize=10)  
plt.subplot(3,3,7)
plt.title("Original ask Amount vs Startup Count ", fontsize=12)
plt.xlabel("Range of original Ask Amount",fontsize=10)
plt.ylabel("Count of Startups in given range",fontsize=10)
sns.histplot(df['Original Ask Amount(INR(lacs))'],color="#17becf", bins=10, kde=True,) #Inference:-Right skewed 
plt.subplot(3,3,9)
plt.title("State wise Number of Startups", fontsize=12)
plt.xlabel("Count of Startup",fontsize=10)
plt.ylabel("Name of State",fontsize=10)
state=df['Pitchers State'].value_counts().head(10)
sns.barplot(x=state.values, y=state.index,color='#17becf')
Out[155]:
<Axes: title={'center': 'State wise Number of Startups'}, xlabel='Count of Startup', ylabel='Name of State'>
No description has been provided for this image

Correlation heatmap¶

In [99]:
plt.figure(figsize=(30,30))
plt.title("Correlation Heatmap", fontsize=12)
df.select_dtypes(include='number').columns
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.1f')
Out[99]:
<Axes: title={'center': 'Correlation Heatmap'}>
No description has been provided for this image

Basic Insights from correlation heat map using chat Gpt¶

# Pair of columns (A ↔ B) Corr. Interpretation / Action
1 Original Ask Amount ↔ Valuation Requested ≈ 0.91 Founders almost always keep an ask/valuation ratio (equity %) fixed. You can sanity‑check pitches: outliers here → “unusual equity ask.”
2 Total Deal Amount ↔ Deal Valuation ≈ 0.88 When sharks invest more cash, the agreed valuation rises proportionally. Good feature if you want to predict deal valuation from verbal “cash” offers.
3 Total Deal Amount ↔ Total Deal Equity ≈ 0.75 Equity % still climbs with cash, but less tightly than valuation. 🡆 Extra cash beyond a point often comes with debt/royalty instead of straight equity.
4 Number of Sharks in Deal ↔ Total Deal Amount ≈ 0.55 More sharks = bigger cheques. If you model “probability a pitch attracts > 3 sharks”, include ask size.
5 Shark‑specific Amount ↔ Shark‑specific Equity (e.g. Namita Investment Amount vs Namita Investment Equity) > 0.9 (per shark) Each shark follows her own “cash per %” thumb‑rule. Helpful to learn per‑shark valuation heuristics.
6 Cross‑shark presence correlations are near 0 Sharks show up independently across episodes; no tight alliances. Good news: co‑presence won’t bias modelling.
7 EBITDA & Net Margin ↔ Yearly Revenue Mild + Larger revenue firms tend to be profitable, but not strongly. Profitability alone doesn’t decide deal size (correlation < 0.4).
8 Royalty Percentage ↔ Royalty Recouped Amount ≈ 0.99 (trivial) As expected – skip one of these in modelling (multicollinearity).
9 Gender counts vs funding (e.g. Female Presenters vs Total Deal Amount) ~0 No obvious bias in cheque size by gender count.